{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "3450d69f",
   "metadata": {},
   "outputs": [],
   "source": [
    "config = \"./kb.yml\"\n",
    "from txtai.app import Application   \n",
    "\n",
    "app = Application(config)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "ba905c5c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\n",
      "Summary\n",
      " \n",
      "The Information Age has essentially drawn to a close, even though the process of computerizing business, generating data, converting data into information, and distributing it via networks will persist indefinitely. The world has now transitioned into the Digital Age.\n",
      " \n",
      "A critical question confronts enterprise IT: Can legacy software architectures from the Information Age continue satisfying the requirements of the Digital Age? A short answer is NO.\n",
      " \n",
      "Throughout the last three to four decades of information technology evolution, the conventional software engineering approach has endured. However, it has become increasingly apparent that this approach no longer aligns with the demands of the digital age. The era that was inaugurated during the early stages of the mobile internet and the rise of the App economy has now plateaued, exhibiting signs of weariness. Companies grapple with the challenge of maintaining cost-effectiveness in the development, maintenance, and operation of their proprietary applications. IT productivity consistently lags behind market demands, leaving business stakeholders dissatisfied.\n",
      " \n",
      "In reality, IT has persistently grappled with:\n",
      " \n",
      "1. The never-ending cycle of creating or acquiring new systems, leading to the proliferation of information silos and an unceasing demand for connecting them.\n",
      "2. \"Pseudo-agility\" or \"agility theatre\", where IT teams tout adoption of agile methodologies, yet fall short of aligning software functionality delivery with the swift shifts in market dynamics and business needs.\n",
      "3. Inability to achieve proportional IT productivity scalability despite added personnel and budgets, consistently drawing business complaints over imperceptible progress.\n",
      " \n",
      "In the digital age, software and architecture differ significantly from the information age, demanding adaptation for competitiveness. It's time for a breakthrough in IT, where software forms and architectures must evolve beyond the information age.\n",
      " \n",
      "The future of enterprise digital software is SuperApp - a software paradigm for the digital era.\n",
      "\n",
      "\n",
      " \n",
      "\n",
      " \n",
      "Key Concepts\n",
      " \n",
      "HTML5: Encompassing HTML, JavaScript, CSS and related web technologies. For the public it means webpages, for mobile developers it means code snippets using these technologies, usually embedded in apps. \n",
      " \n",
      "App: The standard software form on smartphones since the mobile internet era. Though formats and standards vary on iOS, Android, etc., they are software running on mobile devices, by nature indifferent from their PC counterparts.\n",
      " \n",
      "Native App: Analogous to Windows desktop software developed in C++, Visual Basic, Pascal etc. Mobile apps use languages like Objective-C, Swift, Kotlin, Java, C, C++ supported by OS vendors. \n",
      " \n",
      "Hybrid App: For diverse OSes like iOS and Android, the same business logic needs redeveloping with different tech, necessitating larger teams, longer cycles and inconsistent user experience across devices. Cross-platform solutions emerged using technologies like Flutter, React Native to build app foundations portable across OSes, and HTML5 for adaptable, business-focused, frequently-changed content.\n",
      " \n",
      "SuperApp: A class of Apps like WeChat and Alipay transcend traditional software boundaries by becoming platforms, running millions of 3rd party code known as  Mini-App  or  Mini-program  securely within themselves, covering all aspects of life, harnessing massive business and societal resources, constructing digital worlds where users can largely reside without returning offline. Elon Musk praises such apps and aims to remake Twitter (now X) this way. Gartner's 2023 Tech Trends report dubs them SuperApp - no longer exclusive to internet giants and now entering enterprises.\n",
      " \n",
      "Mini-App/Mini-program: Around 2017, WeChat introduced \"Mini-program\" technology. This was instrumental to WeChat becoming a SuperApp - Mini-programs have low development barriers, great mobile user experience, and can harness resources across society, serving as an enabler of \"digitalization\". Subsequently, other Internet giants followed suit, developing similar Mini-program syntax, formats and runtimes, releasing their own Mini-program technologies to compete for internet ecosystem share. By leveraging societal resources through Mini-programs, these platforms aim to emulate WeChat's success as a SuperApp. In 2021, W3C formed a group to standardize Mini-programs and called them Mini-Apps. In this paper,  Mini-App  and  Mini-program  terms are used interchangeably.\n",
      " \n",
      "W3C Mini-App Workgroup: With the tremendous success of Mini-programs, more vendors participate, establishing this W3C group to standardize Mini-programs.\n",
      " \n",
      "SuperApp Enabling Tech for Enterprise: Internet giants have obtained huge business value through their proprietary Mini-program technologies and centrally controlled SuperApp platforms. In contrast, enterprises across industries and even government agencies have been relegated to becoming passive ecosystem members on these external SuperApp platforms. Commercial organizations and government bodies long for similar technologies to attain autonomy - to assume platform operator roles, construct and govern their own digital ecosystems, accumulate their own data, and safeguard operational and customer data privacy. Fortunately, SuperApp technologies have now been standardized for enterprise adoption, enabling organizations to emulate the practices of internet SuperApps, independently harnessing Mini-Apps to build their own digital platforms and ecosystems.\n",
      " \n",
      "FinClip Mini-App Spec: The Mini-program technology independently defined and developed by FinoGeeks maintains compatibility with mainstream internet Mini-programs in formats, interfaces, and specifications. This enables enterprise IT to reuse existing technical investments and talent, as well as leverage the sizeable stock of Mini-program content resources in society. It also supports related W3C standards, empowering enterprises through standardized technologies.\n",
      " \n",
      "FinClip Mini-program Runtime: FinoGeeks, currently the sole independent vendor in this domain, commenced developing comprehensive Mini-program technology in 2019. It enables enterprises of any scale to cost-effectively obtain capabilities previously exclusive to internet giants, building their own SuperApps and digital ecosystems.\n",
      " \n",
      "FinClip Security Sandbox: All Mini-program code is downloaded from the internet and executed on-demand. To ensure that this code poses no potential security risks to user devices, FinClip employs a secure sandbox technology within its runtime environment to isolate and execute Mini-program code. This sandbox also isolates any communication or sharing between Mini-programs.\n",
      " \n",
      "FinClip SDK: FinoGeeks encapsulates the aforementioned Mini-program runtime and secure sandbox technology within an SDK. This SDK offers support for various operating systems, including iOS, Android, Windows, Mac, Linux, and more. To enable the capability of running Mini-programs within any existing app, integration of this SDK requires just a few lines of code.\n",
      " \n",
      "FinClip SuperApp: A SuperApp built with the FinClip SDK, capable of operating unlimited Mini-Apps/Mini-programs. \n",
      " \n",
      "Host App: In relation to Mini-programs, an App whether it is entirely native or hybrid in nature becomes a host for running Mini-programs once integrated with the FinClip SDK. This can be likened to virtualization technology, where the virtual machine is the guest, and the operating system running the virtual machine is the host. Mini-programs \"reside\" within the host. FinoGeeks' FinClip technology extends the concept of \"host\" to PC operating systems, IoT devices, and more. Therefore, strictly speaking, the host is not limited to mobile Apps, although in this context, unless specified otherwise, it can be assumed to refer to regular Apps.\n",
      " \n",
      "Mini-program container technology: isolates each Mini-program within its secure sandbox, allowing individual instances to run independently in their dedicated memory, threads, and storage and making them invisible to one another, akin to placing each Mini-program instance within its own \"container\". This containerization is key to achieving economies of scale for SuperApps.\n",
      " \n",
      "Mini-program Runtime Lifecycle: When a Mini-program instance is loaded into the host, its various states such as initialization, visibility, invisibility, and destruction are collectively referred to as the \"Lifecycle.\" State transitions are typically triggered by user interactions within the host.\n",
      " \n",
      "Mini-program Publishing Lifecycle: The various stages a Mini-program goes through, including development, testing, grayscale/canary release (partial deployment), production release (full deployment), and termination, collectively constitute its \"publishing status.\"\n",
      " \n",
      "Host Lifecycle: For mobile devices like smartphones and tablets, the host app lifecycle is typically associated with the mobile operating system and the app stores operated by mobile manufacturers. It's important to note that the \"Mini-program publishing lifecycle\" is entirely separate from the host's publishing lifecycle. Mini-program developers and host developers may have no connection, which is a key factor in achieving SuperApp economies of scale (as discussed later).\n",
      " \n",
      "Zero Trust: A security model first proposed by Forrester Research analyst John Kindervag in 2010, departing from traditional IT security models focused on protecting network perimeters and assuming everything inside is trusted. Zero trust presumes no implicit trust in any person or device inside or outside networks.\n",
      "\n",
      "  \n",
      "\n",
      "\n",
      " \n",
      "App 1.0   Software Paradigm in the Information Age\n",
      " \n",
      "The information age represented the transition from the isolated computerization age to an interconnected age enabling information exchange.\n",
      " \n",
      "1. Computerization or Electronization: In this phase, exemplified by banks, various business processes are gradually transformed into software systems. During this period, software systems primarily run on isolated, non-networked computing devices (such as Mainframes). At this stage, there is only data, and information is yet to be fully realized.\n",
      "\n",
      "2. Informatization: With the emergence of network technologies, the 1990s saw the gradual formation of a global internet. Previously isolated computers now transmit their computational results over networks, transforming them into information. During the Web 1.0 era, the internet connected these computers with remote users, breaking geographical barriers and facilitating interactions. Information primarily flowed from the inside to the outside.\n",
      "\n",
      "The advent of mobile internet and the rise of the app economy haven't fundamentally altered the software development models in use since the mainframe, Client/Server, and PC eras. Enterprise IT still adheres to old paradigms in terms of development philosophy and technical architecture. As a result, app development inherits most of the traditional software development problems.\n",
      "\n",
      "Software Planned Economy Results in Agile Theatre\n",
      "Software development guided by the \"planned economy\" mindset of the information age cannot be truly agile. The one-way, inside-out development model entails internal business departments initiating feature requests, then IT analysing and planning those features, either building them internally per plan or outsourcing development, followed by repetitive in-house testing cycles. After bundled features and fixes are deemed ready, IT packages a release and submits it to third-party app stores, with the goal of getting external users to download and install. This unidirectional approach is extremely slow and clumsy with high time costs, coarse granularity, and inability to meet urgent business needs to reach markets swiftly. IT s self-proclaimed  agility  doesn t help. \n",
      "\n",
      "The digital age calls for fine-grained capabilities, fast development and release, rapid low-cost trial-and-error, and completely controllable risks. Software is \"utility computing\" - users utilise on-demand then move on. The \"planned economy\" mindset cannot satisfy these demands.\n",
      "\n",
      "Waste IT Cycles of infinite connect, disconnect, connect \n",
      "IT oscillates between perpetually \"creating information silos\" and attempting to \"connect them\". New systems are continually acquired and developed, with each addition resulting in more silos. Past integration efforts using ESBs (Enterprise Service Buses), messaging middleware etc. were costly and minimally effective. Business stakeholders often cannot grasp or quantify the benefits, thus remain dissatisfied. Enterprises are forever tackling information silos.\n",
      "\n",
      "Digitalization necessitates people, scenarios, activities, processes and transactions being online with intersystem interconnection. Information silos run counter to digitalization's fundamental prerequisite of openness and connectivity.\n",
      "\n",
      "Lacking of Operational Flexibility\n",
      "IT-developed apps often lack what we could term \"operational flexibility\" from the business perspective. In fact, over the past three to four decades of computerization and informatization, not just apps - most software systems have faced similar inflexibility, unable to meet business units' operational needs.\n",
      "\n",
      "Many software systems attempt to address this by implementing mechanisms like \"management backends,\" which allow business users some degree of control, enabling them to configure parameters and rules independently, without needing to request IT's assistance for every new requirement that wasn't considered during the initial software design phase. However, these \"management backend\" mechanisms often lack the flexibility needed for applications primarily aimed at external customers or partners (especially mobile apps). They struggle to meet the demands of timely and rapidly changing operational needs.\n",
      "\n",
      "In simpler terms, apps that continue to follow the software mindset of the information age often fall short of embracing the \"digitally operational\" mindset.\n",
      "\n",
      "The requirements of the digital age are characterized by a strong demand for high timeliness, the need to reach users via the internet, and a business department's reluctance to wait for IT's development schedules. Business users aspire to have control over the creation and publication of digital content, especially in marketing-focused business lines where direct customer engagement through company software becomes essential.\n",
      "\n",
      "IT Productivity Cannot Scale\n",
      "IT perennially grapples with understaffing, leaving business departments perpetually dissatisfied with IT support. Even with limitless budgets and personnel, can IT genuinely deliver linear or near-linear productivity growth relative to human resources invested? This perennial question leaves business-side and senior management perplexed about where all the IT investment goes.\n",
      "\n",
      "In the realm of software development, throwing in more people doesn t necessarily translate to heightened productivity. What is the root cause?\n",
      "\n",
      "\n",
      "\n",
      "It lies in the technical architecture. Traditional IT often employs tightly-coupled architectures, similar to a confined physical space in a small house where additional personnel offer limited manoeuvrability. Mobile apps, constrained by device form factors, exemplify such \"palm-sized spaces\" where adding engineers proves unproductive.\n",
      "\n",
      "The digital age necessitates businesses to embrace loosely-coupled technical architectures, where \"Lego\"-like functional modules dynamically assemble to meet customer needs. These modules possess low or no interdependence, enabling disparate individuals, teams, and departments to work independently in parallel. Apps become portals for assembling prefabricated components, embodying the ethos of \"more hands, lighter work.\"\n",
      "\n",
      "Self-imposed limitation obstructing true openness\n",
      "Enterprises are habitually conditioned to presume rigid software boundaries, struggling with openness. Stuck in information age mindsets, most enterprises still think along the lines of \"initiating internal business requests\" - \"planning and designing systems\" - \"outsourcing to developers to build and test\" - \"conducting users acceptance test to obtain online approval from business units\"   \"upgrading entire system to go live\". This inside-out, one-way thinking strongly assumes software serving customers must be natively developed and operated by the enterprise itself. Though many tout \"openness\", it's mostly lip service. How to open up? What signifies openness? How to ensure a secured openness? It s easy said than done but devil is in the details. Where are the necessary tools, best practices and actual implementation solutions?\n",
      " \n",
      "Digitalization necessitates disrupting traditional physical boundaries, redefining digital edges. Digitally advanced firms rethink internal/external network boundaries - internal software cannot presume safety, while external services must ensure security. Firms must utilize technical intermediaries (telcos, internet traffic platforms, third parties) to reach customers externally, while enabling trusted exchange of digital resources (source code) with partners internally. \n",
      " \n",
      "Enterprise software development, procurement, and operations are no longer necessarily entirely in-house activities. In the digital age, embracing interconnectedness and symbiotic relationships between organizations is indispensable for enterprises to thrive.\n",
      " \n",
      "\n",
      "\n",
      "\n",
      "App 2.0 - Software Forms in the Digital Age\n",
      " \n",
      "The aforementioned problems have been aptly resolved, specifically manifest in SuperApps. \n",
      " \n",
      "The SuperApp tech trend spearheaded by internet giants has actualized the concepts and user experiences of Web 2.0 that traditional enterprise IT largely missed out on. It's now opportune to institute similar practices in traditional enterprises. Let's delve into how the various problems associated with the earlier Informatization era's Software 1.0 can be addressed one by one.\n",
      "\n",
      "\n",
      "Market Economy Driven Software Paradigm\n",
      "SuperApps with Mini-programs represent a \"market economy\" software development paradigm that architecturally enables dynamic fulfilment of unpredictable, challenging-to-plan business needs. \n",
      "\n",
      "Business demands are endless while market conditions rapidly shift. But the one-way thinking of traditional IT makes ensuring business innovation and time-to-market extremely difficult, possibly due to:\n",
      "\n",
      "1. Business functional requirements may be uncertain, with trial-and-error aspects. IT may overinvest in potentially misguided efforts, proven wasteful months later upon market launch.\n",
      "\n",
      "2. Analysis by personnel of business requirements also risks being incomplete or misguided. In the software 1.0 era, IBM recommended dedicating 50% of project time to requirements analysis, but few enterprises actually achieve this. Incorrect analysis, coupled with typical enterprise communication disconnects and misalignments, lead to deviation between software intent and implementation.\n",
      "\n",
      "3. Months-long implementation schedules hinder responding to market changes, rendering previously reasonable designs ineffective upon launch.\n",
      "\n",
      "The solution is controlling capability granularity - develop one capability and immediately release it, making business capability releases extremely affordable. This enables rapid development and staged rollouts with quick experimentation in controlled scopes through rapid iterations and rapid launch.\n",
      "\n",
      "SuperApp enabling technologies, capable of aggregating unlimited number of fine-grained business scenarios encapsulated in Mini-programs, provide such a solution. The completely decoupled yet complementary SuperApp \"Host Lifecycle\" and Mini-program \"Publishing Lifecycle\" enable long-term host stability with frequently-changing Mini-program development, testing and release. Mainstream internet SuperApps have proven this flexibility - the largest platforms release hundreds of thousands of Mini-programs with millions of publishing and removals daily without issues.\n",
      "\n",
      "Moreover, SuperApp operators do not need to predict what future business capabilities to support. The Mini-program mechanism mobilizes societal resources across industries for endless digital scenarios and developer contributions. New, unplanned and innovative business capabilities will just appear seamlessly. Compared to software 1.0's \"Planned Economy\" driven engineering paradigm, this exemplifies \"Market Economics\". In fact, the mobile internet era's app stores pioneered such bona fide software market dynamics, now ripe for enterprises.\n",
      "\n",
      "An enterprise operating its own Mini-programs Marketplace actualizes digital ecosystem and modularization, delivering on-demand granular capabilities, powering true agility.\n",
      "\n",
      "\n",
      " Core Business+  Strategy \n",
      "SuperApp with Mini-programs, akin to the 'Internet+' strategy, serve as a core capability that transcends various application scenarios, bridging information silos. For instance, a bank can employ its core capabilities, such as accounts, KYC, credit, and payments, to establish a 'Banking+' strategy. Through a banking SuperApp, it can leverage these core capabilities and introduce various 'Mini-program' commercial scenarios sitting on top of the core capability APIs. If the Mini-programs are provided by external 3rd parties partners, then bank is now essentially orchestrating a very viable Open Banking strategy. \n",
      "\n",
      "While this approach doesn't entirely solve the problem of 'information silos' inherited from the information age   enterprise IT may still require message middleware, enterprise service buses, etc., to integrate internal systems   using the SuperApp technology platform to connect multiple scenarios and parties represents another means of breaking down information silos. This approach is often the most tangible and direct way for business departments, partners, and customers to perceive improved digital capabilities. Users' most immediate experience of a company's 'digital' prowess is having a single app that can handle all their business needs without going offline.\n",
      "\n",
      "Better yet, the SuperApp approach is essentially leveraging the Internet itself as the service bus to connect services from both internal systems and external partner systems for a customer.\n",
      "\n",
      " \n",
      "IT Is the Platform While Business Run on The Platform\n",
      "SuperApp with Mini-programs can form a highly operable business platform. With such a platform, enterprises can appoint operational teams or even establish dedicated digital operations departments as owners and managers. They are responsible for orchestrating the collaboration among internal business departments, industry chain partners, and third-party collaborators, effectively acting as a platform operator. This means promoting and enticing internal and external parties to 'join' the platform, transforming their business scenarios into Mini-programs, and providing Mini-program content to the platform.\n",
      "\n",
      "In this context, if business departments and partners are the 'publishers' of Mini-program content, the platform itself becomes the 'distributor' responsible for the final review, management of listings, and operations on the platform's associated apps (such as placement, user recommendations, and search functionality).\n",
      "\n",
      "Boost IT Productivity with Elasticity\n",
      "SuperApp with Mini-programs enable large-scale parallel development within enterprises, eliminating the need for 'scheduling' business department demands. Enterprise IT should contemplate and draw lessons from why certain internet companies' SuperApps can seamlessly accommodate millions of developers working concurrently without interference with each other, allowing them frequent uploading and removing of various content. If a company's IT possesses such capabilities, could it organize numerous teams to develop in parallel and address various demands from different business departments simultaneously? \n",
      "\n",
      "Imagine a scenario where the CEO states that leveraging technology to support business is top priority and in urgent need, willing to allocate funds and resources immediately but wanting to see results as quickly as possible. IT can confidently pledge to increase productivity almost linearly for instance, by temporarily augmenting external engineering talent to achieve the kind of resource elasticity akin to cloud services.\n",
      "\n",
      "\n",
      "These abilities are imperative for any enterprise IT, not just internet BigTech companies. Sometimes the greatest concern isn't a lack of funds or personnel but rather investing resources without visible IT outputs.\n",
      "\n",
      "Openness For Better Leverage\n",
      "SuperApp with Mini-programs, in their open and integrated approach, redefine a company's digital boundaries. Enterprises provide services to customers and partners through software, but this 'software' no longer necessarily originates from the enterprise itself. The integration and exchange of online digital resources may not always be IT-driven; it's increasingly a business endeavour. Business units can introduce external partners, launch their digital services, leverage third-party services for their customers, and conduct transactions through revenue sharing or resource exchange. This process often doesn't involve IT departments procuring third-party digital services, as these third-parties may not even be software providers and don't meet the conditions for selling software.\n",
      "\n",
      "Furthermore, lightweight software in Mini-program form becomes an online digital resource that can be forwarded, shared, circulated and flowed to customers. It runs within an enterprise's app, but this doesn't imply that its source code was developed or procured by the enterprise's IT department for deployment.\n",
      "\n",
      "In summary, a SuperApp is not a simple replacement for traditional apps; it's fundamentally an open platform that is made possible with Mini-programs. With robust security mechanisms, IT technically operates and maintains the platform while business departments handle 'business development' on the platform by attracting partners and facilitating collaborations, by bringing in partners  Mini-programs to the company s self-hosted  Mini-program store .\n",
      "\n",
      "IT becomes the platform itself.\n",
      " \n",
      "Conclusion: Capabilities of App 2.0\n",
      "These capabilities constitute the foundations of next-gen enterprise software essential for digital transformation across industries in the post-mobile internet era. \n",
      " \n",
      "The emergence of generative AI technology may herald the arrival of a software 3.0 era, but has this era truly arrived? According to Andrej Karpathy, Director of AI at Tesla and former OpenAI scientist, it hasn't yet. Generative AI with natural language programming capabilities introduces a new layer of abstraction, allowing you to instruct it to generate software using human language. However, these generated artifacts remain fundamentally 2.0 products.\n",
      "\n",
      "The technological architecture of SuperApps along with Mini-programs embodies a divide-and-conquer algorithmic approach.  It is sensible to currently focus on judiciously leveraging it to decompose immense complex business problems into isolated pieces, applying low-code/no-code and generative AI locally. This pragmatic targeting of generative AI on component problems within the overarching SuperApp + Mini-program architecture may prove more realistic.\n",
      "\n",
      "\n",
      "\n",
      " \n",
      "Working Theory Behind SuperApps\n",
      " \n",
      "SuperApp technology is not a simple \"replacement\" for traditional apps. At minimum, it comprises:\n",
      "\n",
      "1. Client-side: FinClip SDK by FinoGeeks (see definitions) embeds SuperApp frontends into existing mobile apps on iOS, Android, etc, as well as Linux, Windows, Mac OS, and even IoT and low-power device screens, enabling ubiquitous Mini-programs.\n",
      "2. Mini-Apps Store/Marketplace: This is a tool for enterprises to deploy and operate their own Mini-program management system. It allows companies to centrally manage Mini-program content from both their internal teams and external partners, including content review, publishing, retraction, and monitoring. The external digital ecosystem of the enterprise relies on this system for its operation. Additionally, the most important user-facing function is to provide mechanisms for discovering and showcasing Mini-program content, such as user access through search, recommendations, rankings, and more.\n",
      "3. Mini-App publishing, staged rollout, and developer portal: SuperApp serves as an open platform where developers can register developer accounts, upload, update, and retract their developed Mini-program content. Developers can also set rules and strategies for gradual releases to control the scope of content deployment and the target user groups.\n",
      "4. Data monitoring tools: Collect and analyze Mini-program runtime and usage data within the SuperApp to help enterprises optimize operational strategies.\n",
      "\n",
      "The key points are the client-side SDK to embed Mini-program capabilities everywhere, the enterprise marketplace to manage Mini-programs, developer portals for creating Mini-programs, and data tools to optimize SuperApp ecosystem operations.\n",
      " \n",
      "Moreover, to facilitate the streamlined creation of Mini-program-style digital content by enterprise IT and its partners, FinoGeek's FinClip technology also offers tools such as FinClip Studio and FinClip Browser. These tools make it convenient for developers to create and debug Mini-programs.\n",
      " \n",
      "SuperApp technology is Cloudification\n",
      " \n",
      "\"Cloudifying\" app capabilities on client devices transcends \"form factor\" constraints, unshackling IT productivity. As discussed earlier, the traditional app confined to the phone s minuscule palm-sized space\" can be likened to a 10 sqft. room - adding more IT engineers is unavailing in the cramped space, more business requests still await in a serialised long queue pending for sequential IT development. When faced with especially urgent and crucial business demands, the business departments, as particular software feature \"owners,\" often have to approach IT to increase priority levels or even seek intervention from company senior management. In such a congested space, the idea of openness becomes almost impossible.\n",
      " \n",
      "\n",
      "The solution entails retaining basic, core, stable functionality (e.g. biometrics, video, digital payments, stock trading etc.) in the 10 sqft. room, while shifting business scenario content to the unbounded space of the cloud. Mini-programs make this achievable - each independently developed, debugged, published, monitored on the cloud-side, then seamlessly loaded into an device-side App when required. This architecture enables a constrained device-side host app in the minuscule space to accommodate an unlimited number of teams and individuals offering endless business possibilities.\n",
      " \n",
      "SuperApp technology is Zero-Trust\n",
      "\n",
      "As elucidated earlier, enterprise digital boundaries are redefined in the digital era, with interconnectedness and openness between an enterprise and partners epitomizing digitalization. No bona fide platforms or digital ecosystems exist sans openness. So how do successful internet SuperApps open up and ingest digital content including source code from countless entities devoid of security qualms? The crux is Zero Trust - trust nobody s code and always employing security sandboxing to sequester partner source code (Mini-programs), insulating interactions between Mini-program instances, insulating communication between Mini-programs and their host app and insulating Mini-program access to local device resources.\n",
      " \n",
      "In truth, no enterprise IT team or software project today can eschew depending on open-source technologies, hence cannot presume the nested layers of open-source dependencies are devoid of vulnerabilities or tainted code. The most prudent strategy is for enterprise SuperApp platforms to espouse zero trust irrespective of any partner or even their own IT.\n",
      "\n",
      "\"Zero trust\" permits secure openness to construct digital ecosystems. (Note: this borrows concepts from the Zero Trust security model - refer to definitions at the outset.)\n",
      "\n",
      " \n",
      "\n",
      "\n",
      "\n",
      "Economies of Scale for Digital Transformation\n",
      " \n",
      "Today, across various industries, there is a push for the digital economy and digital transformation. While concepts and theories abound, how can enterprises concretely put the slogan of \"digital transformation\" into practice? How can they make their customers, partners, employees palpably feel it? In a sense, it means entering \"App 2.0\" by embracing secure openness and ecosystem interconnections to codify everything into software. \"Software is eating the world\", Marc Andreessen proclaimed in the Wall Street Journal on August 20, 2011. \n",
      "\n",
      "But  Talk is cheap. Show me the code.\"\n",
      " \n",
      "SuperApp technology is exactly such a  killer technology  that renders  digitalization  intuitively perceptible and catalyzes snowball effects for enterprise digital transformation   just consider how digitalization has climbed to current heights in China, made tangible and accessible to the general public, with some  national-grade  or so-called  citizen  SuperApps playing decisive roles. Government and enterprises should emulate such successful practices within their own vertical realms.\n",
      " \n",
      "The cardinal reason SuperApp technology can accelerate digitalization is by virtue of the aforementioned  cloudification  and  zero trust  mechanisms, it makes digitalization of discrete scenarios highly viable. The technical platform achieves economy of scale, equivalent to mobilizing an unlimited number of societal resources online, thereby further accelerating the online transformation of more societal resources. In comparison, traditional apps on smartphones cannot achieve this because, for the vast majority of enterprises, the professional threshold and economic cost of developing, operating, and running an app themselves are much higher than those of a Mini-program running within a SuperApp.\n",
      " \n",
      "Quantitative change drives qualitative change   when social resources and business scenarios become highly easy to codify, extremely rich and diverse software ecosystems emerge, eventually encompassing everything, furnishing consumers nearly all required services in the virtual realm. Is this not the most palpable digitalization?\n",
      " \n",
      "Now, SuperApp technology no longer remains exclusive privilege of any internet giants or BigTech companies. It is time for commercial and government entities to adopt this enabling technology to build their interlinked world with supply chain partners, online customers, and suitable societal resources, and to position themselves as platform owners to function as platforms within their own business realms. From the perspective of openness, interconnection and ecosystem, SuperApps are the most efficacious medium for numerous institutions to undertake their own digitalization.\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "docs1 = \"../../../finclip-kb/source/SuperAppAge_EN.txt\"\n",
    "with open(docs1, \"r\") as file:\n",
    "    text = file.read()\n",
    "\n",
    "print(text)  # The whole file content as a single string"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "e628e376",
   "metadata": {},
   "outputs": [],
   "source": [
    "from markitdown import MarkItDown"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5d649855",
   "metadata": {},
   "outputs": [],
   "source": [
    "md = Markitdown()clean_text = md.format(raw_text)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "2b2c4362",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/cliang/miniconda3/envs/jupyterlab/lib/python3.12/site-packages/txtai/pipeline/data/htmltomd.py:55: MarkupResemblesLocatorWarning: The input looks more like a filename than markup. You may want to open this file and pass the filehandle into Beautiful Soup.\n",
      "  soup = BeautifulSoup(html, features=\"html.parser\")\n"
     ]
    }
   ],
   "source": [
    "# Extract segments using the textractor pipeline\n",
    "segments = list(app.pipelines[\"textractor\"](docs1))\n",
    "\n",
    "# Create documents with proper structure (following your CLI approach)\n",
    "documents = []\n",
    "for i, text in enumerate(segments):\n",
    "    doc_id = f\"data_science_{i}\"\n",
    "    documents.append({\n",
    "        \"id\": doc_id,\n",
    "        \"text\": text,\n",
    "        \"metadata\": {\n",
    "            \"source\": docs1,\n",
    "            \"index\": i,\n",
    "            \"total\": len(segments)\n",
    "        }\n",
    "    })\n",
    "\n",
    "# Add documents to the index\n",
    "app.add(documents)\n",
    "app.index()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "847c9c06",
   "metadata": {},
   "outputs": [],
   "source": [
    "queries = [\n",
    "    \"What programming languages are used in data science?\", \n",
    "    \"What is AutoML?\", \n",
    "    \"What causes poor model reliability in data science? \",\n",
    "    \"What tools are used for data visualization?\"]\n",
    "\n",
    "results = app.search(queries[3])\n",
    "for result in results:\n",
    "    print(f\"Score: {result['score']:.4f}\")\n",
    "    print(f\"Text: {result['text'][:200]}...\")  # Print first 200 chars\n",
    "    print(\"-\" * 80)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "8245f6bd-eea3-4db4-8965-b52f860cff98",
   "metadata": {},
   "outputs": [],
   "source": [
    "from txtai import Embeddings\n",
    "from txtai.pipeline import Textractor\n",
    "\n",
    "# Create textractor model\n",
    "textractor = Textractor(\n",
    "    paragraphs=True,\n",
    "    backend=\"text\"\n",
    ")\n",
    "\n",
    "# Create an embeddings\n",
    "embeddings = Embeddings(path=\"sentence-transformers/nli-mpnet-base-v2\")\n",
    "input_docs = \"/Users/cliang/repos/finclip-kb/source/SuperApp_whitepaper_EN.pdf\"\n",
    "input_docs2 = \"../test/knowledgebase/data_science.md\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "56d615e7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create an index for the list of text\n",
    "data = [\n",
    "  \"US tops 5 million confirmed virus cases\",\n",
    "  \"Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg\",\n",
    "  \"Beijing mobilises invasion craft along coast as Taiwan tensions escalate\",\n",
    "  \"The National Park Service warns against sacrificing slower friends in a bear attack\",\n",
    "  \"Maine man wins $1M from $25 lottery ticket\",\n",
    "  \"Make huge profits without work, earn up to $100,000 a day\"\n",
    "]\n",
    "\n",
    "embeddings.index(input_docs2)\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2af15189",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"%-20s %s\" % (\"Query\", \"Data science\"))\n",
    "print(\"-\" * 50)\n",
    "\n",
    "# Run an embeddings search for each query\n",
    "for query in (\"feel good story\", \"Data science\"):\n",
    "  # Extract uid of first result\n",
    "  # search result format: (uid, score)\n",
    "  uid = embeddings.search(query, 1)[0][0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "88dd903c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Debug the process\n",
    "import os\n",
    "print(f\"File exists: {os.path.exists(input_docs)}\")\n",
    "result = list(textractor(input_docs))\n",
    "print(f\"Extracted {len(result)} text chunks\")\n",
    "if result:\n",
    "    print(\"First chunk:\", result[0][:100], \"...\")\n",
    "else:\n",
    "    print(\"No text extracted\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0e5bdcfc-6c19-4c20-974a-0bb5095c2f5d",
   "metadata": {},
   "outputs": [],
   "source": [
    "for paragraph in textractor(input_docs):\n",
    "  print(paragraph, \"\\n----\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e81b28d1",
   "metadata": {},
   "outputs": [],
   "source": [
    "textractor = Textractor(sections=True)\n",
    "print(\"\\n[PAGE BREAK]\\n\".join(section for section in textractor(input_docs)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "id": "2a99b60a",
   "metadata": {},
   "outputs": [],
   "source": [
    "from txtai.app import Application\n",
    "\n",
    "app = Application(\"./kb.yml\")\n",
    "app.add(input_docs2)\n",
    "app.index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b381d3c5",
   "metadata": {},
   "outputs": [],
   "source": [
    "from txtai.app import Application\n",
    "\n",
    "# Create and run application\n",
    "app = Application(\"path: ./.txtai/kb-test\")\n",
    "# Access the embeddings object\n",
    "embeddings = app.embeddings\n",
    "\n",
    "# Print the full embeddings configuration in a more readable format\n",
    "import json\n",
    "print(json.dumps(embeddings.config, indent=2))\n",
    "\n",
    "# Check if graph is available\n",
    "print(f\"\\nHas Graph: {hasattr(embeddings, 'graph')}\")\n",
    "if hasattr(embeddings, 'graph') and embeddings.graph:\n",
    "    print(f\"Graph Node Count: {embeddings.graph.count()}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b3617b94",
   "metadata": {},
   "outputs": [],
   "source": [
    "app.search(\"what is data science\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "a265401b",
   "metadata": {},
   "outputs": [],
   "source": [
    "graph = app.search(\"SuperApp\", limit=100, graph=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5c1312c4",
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import networkx as nx\n",
    "\n",
    "def plot(graph):\n",
    "    labels = {x: f\"{graph.attribute(x, 'id')} ({x})\" for x in graph.scan()}\n",
    "    options = {\n",
    "        \"node_size\": 750,\n",
    "        \"node_color\": \"#0277bd\",\n",
    "        \"edge_color\": \"#454545\",\n",
    "        \"font_color\": \"#fff\",\n",
    "        \"font_size\": 6,\n",
    "        \"alpha\": 1.0\n",
    "    }\n",
    "\n",
    "    fig, ax = plt.subplots(figsize=(17, 8))\n",
    "    pos = nx.spring_layout(graph.backend, seed=0, k=0.9, iterations=50)\n",
    "    nx.draw_networkx(graph.backend, pos=pos, labels=labels, **options)\n",
    "    ax.set_facecolor(\"#303030\")\n",
    "    ax.axis(\"off\")\n",
    "    fig.set_facecolor(\"#303030\")\n",
    "\n",
    "    plt.show()\n",
    "\n",
    "plot(graph)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e64f5b67",
   "metadata": {},
   "outputs": [],
   "source": [
    "for x in list(graph.centrality().keys())[:20]:\n",
    "    print(graph.node(x))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "44df7d1c",
   "metadata": {},
   "outputs": [],
   "source": [
    "from txtai.pipeline import Extractor\n",
    "\n",
    "# Create extractor instance\n",
    "extractor = Extractor(embeddings, \"distilbert-base-cased-distilled-squad\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "12e5d98e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import necessary libraries if not already imported\n",
    "from txtai.pipeline import Extractor\n",
    "\n",
    "# Create extractor instance using your existing embeddings\n",
    "extractor = Extractor(embeddings, \"distilbert-base-cased-distilled-squad\")\n",
    "\n",
    "# Read the data_science.md file\n",
    "with open(\"../../test/knowledgebase/data_science.md\", \"r\") as f:\n",
    "    data_science_text = f.read()\n",
    "\n",
    "# Split the document into paragraphs for better extraction\n",
    "from txtai.pipeline import Textractor\n",
    "textractor = Textractor(paragraphs=True)\n",
    "paragraphs = list(textractor(data_science_text))\n",
    "\n",
    "print(f\"Document split into {len(paragraphs)} paragraphs\")\n",
    "\n",
    "# Define questions for different search types\n",
    "# 1. Keyword-based questions (direct term matching)\n",
    "keyword_questions = [\n",
    "    \"What programming languages are used in data science?\",\n",
    "    \"What is AutoML?\",\n",
    "    \"What tools are used for data visualization?\"\n",
    "]\n",
    "\n",
    "# 2. Similarity-based questions (semantic understanding)\n",
    "similarity_questions = [\n",
    "    \"How can data scientists ensure their models are fair?\",\n",
    "    \"What steps are involved in preparing raw data for analysis?\",\n",
    "    \"How can organizations derive business value from data science?\"\n",
    "]\n",
    "\n",
    "# 3. Relationship-based questions (graph connections)\n",
    "relationship_questions = [\n",
    "    \"How does feature engineering relate to model performance?\",\n",
    "    \"What is the connection between privacy and ethical data science?\",\n",
    "    \"How do edge analytics and IoT relate to each other?\"\n",
    "]\n",
    "\n",
    "# Function to execute extraction for a list of questions\n",
    "def execute_extraction(questions, data):\n",
    "    results = []\n",
    "    for question in questions:\n",
    "        answer = extractor([(question, question, question, False)], data)\n",
    "        results.append((question, answer[0][1] if answer else \"No answer found\"))\n",
    "    return results\n",
    "\n",
    "# Test keyword-based questions\n",
    "print(\"\\n--- KEYWORD-BASED QUESTIONS ---\")\n",
    "keyword_results = execute_extraction(keyword_questions, paragraphs)\n",
    "for question, answer in keyword_results:\n",
    "    print(f\"Q: {question}\")\n",
    "    print(f\"A: {answer}\")\n",
    "    print()\n",
    "\n",
    "# Test similarity-based questions\n",
    "print(\"\\n--- SIMILARITY-BASED QUESTIONS ---\")\n",
    "similarity_results = execute_extraction(similarity_questions, paragraphs)\n",
    "for question, answer in similarity_results:\n",
    "    print(f\"Q: {question}\")\n",
    "    print(f\"A: {answer}\")\n",
    "    print()\n",
    "\n",
    "# Test relationship-based questions\n",
    "print(\"\\n--- RELATIONSHIP-BASED QUESTIONS ---\")\n",
    "relationship_results = execute_extraction(relationship_questions, paragraphs)\n",
    "for question, answer in relationship_results:\n",
    "    print(f\"Q: {question}\")\n",
    "    print(f\"A: {answer}\")\n",
    "    print()\n",
    "\n",
    "# Test a complex multi-part question\n",
    "complex_question = \"What are the ethical considerations in data science and how do they impact model deployment?\"\n",
    "print(\"\\n--- COMPLEX QUESTION ---\")\n",
    "complex_result = extractor([(complex_question, complex_question, complex_question, False)], paragraphs)\n",
    "print(f\"Q: {complex_question}\")\n",
    "print(f\"A: {complex_result[0][1] if complex_result else 'No answer found'}\")\n",
    "print()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "bc3bac16",
   "metadata": {},
   "outputs": [],
   "source": [
    "def format_graph_context_simple(graph, top_n=10):\n",
    "    \"\"\"\n",
    "    Format graph results with minimal structure.\n",
    "    \n",
    "    Args:\n",
    "        graph: txtai graph object\n",
    "        top_n: number of top central nodes to include\n",
    "        \n",
    "    Returns:\n",
    "        Formatted string suitable for use as context\n",
    "    \"\"\"\n",
    "    # Get centrality scores\n",
    "    centrality_scores = graph.centrality()\n",
    "    \n",
    "    # Get top N nodes by centrality\n",
    "    top_nodes = sorted(centrality_scores.keys(), \n",
    "                       key=lambda x: centrality_scores[x], \n",
    "                       reverse=True)[:top_n]\n",
    "    \n",
    "    # Format with minimal structure\n",
    "    context_parts = []\n",
    "    \n",
    "    for i, node_id in enumerate(top_nodes, 1):\n",
    "        node = graph.node(node_id)\n",
    "        text = node[\"text\"]\n",
    "        topic = node.get('topic', f'Concept {i}')\n",
    "        \n",
    "        # Add a simple header for each concept\n",
    "        context_parts.append(f\"# {topic}\")\n",
    "        context_parts.append(text)\n",
    "        context_parts.append(\"\")  # Empty line as separator\n",
    "    \n",
    "    return \"\\n\".join(context_parts)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ef5dc595",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"--- GRAPH-BASED SEARCH ---\")\n",
    "print(\"\\n--- RELATIONSHIP QUESTIONS GRAPH SEARCH ---\")\n",
    "for question in relationship_questions:\n",
    "    print(\"Q:\" + question)\n",
    "    graph = embeddings.search(question, limit=5, graph=True)\n",
    "    print(format_graph_context_simple(graph, top_n=10))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "536a65c0",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "jupyterlab",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
